mysql delete语句使用where in不走索引的问题解决

mysql delete语句使用where in不走索引的问题解决
1、背景介绍：
想要删除某个表中重复行的数据
先使用聚合函数查询到哪些数据是重复的:
select update_time from test_data group by update_time having count(1) > 1;

2、根据update_time数据相同的值找到对应的id，根据主键id进行删除:
delete from test_data where id in (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t);

然后查看这个sql的执行计划
explain delete from test_data where id in (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t);

root@db 12:16:  [tmp_opa_user]> explain delete from test_data where id in (select id from (select min(id) as id,update_time from test_data  group by update_time having count(1) > 1 ) as t);
+------+--------------------+------------+----------------+---------------+-----------------+---------+------+---------+-------------+
| id   | select_type        | table      | type           | possible_keys | key             | key_len | ref  | rows    | Extra       |
+------+--------------------+------------+----------------+---------------+-----------------+---------+------+---------+-------------+
|    1 | PRIMARY            | test_data  | ALL            | NULL          | NULL            | NULL    | NULL | 1085134 | Using where |
|    2 | DEPENDENT SUBQUERY | <derived3> | index_subquery | key0          | key0            | 9       | func | 108513  |             |
|    3 | DERIVED            | test_data  | index          | NULL          | idx_update_time | 7       | NULL | 1085134 | Using index |
+------+--------------------+------------+----------------+---------------+-----------------+---------+------+---------+-------------+
3 rows in set (0.001 sec)

这种方法没有走主键索引，删除耗时:

1
2

root@db 12:27:  [tmp_opa_user]>  delete from test_data where id in (select id from (select min(id) as id,update_time from test_data  group by update_time having count(1) > 1 ) as t);
Query OK, 81241 rows affected (4.669 sec)

3、更改delete删除方法，使用delete inner join的方法:
sql如下:
explain delete a from test_data a inner join (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t) d on d.id = a.id;

explain delete a from test_data a inner join  (select id from (select min(id) as id,update_time from test_data  group by update_time having count(1) > 1 ) as t)  d on d.id = a.id;
+------+-------------+------------+--------+---------------+-----------------+---------+------+---------+-------------+
| id   | select_type | table      | type   | possible_keys | key             | key_len | ref  | rows    | Extra       |
+------+-------------+------------+--------+---------------+-----------------+---------+------+---------+-------------+
|    1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL            | NULL    | NULL | 1003893 | Using where |
|    1 | PRIMARY     | a          | eq_ref | PRIMARY       | PRIMARY         | 8       | d.id | 1       |             |
|    2 | DERIVED     | <derived3> | ALL    | NULL          | NULL            | NULL    | NULL | 1003893 |             |
|    3 | DERIVED     | test_data  | index  | NULL          | idx_update_time | 7       | NULL | 1003893 | Using index |
+------+-------------+------------+--------+---------------+-----------------+---------+------+---------+-------------+
4 rows in set (0.000 sec)

执行计划提示1执行删除的时候使用了索引:

1
2

root@db 12:22:  [tmp_opa_user]> delete a from test_data a inner join  (select id from (select min(id) as id,update_time from test_data  group by update_time having count(1) > 1 ) as t)  d on d.id = a.id; 
Query OK, 81241 rows affected (4.120 sec)

删除8万条数据执行了4.1s

4、以上两种方法，3的方法会比2的方法快，尤其是在根据二级索引删除的时候，方法3比方法2提升N个档次

1	delete from test_data where update_time in (select update_time from (select update_time from test_data group by update_time having count(1) > 1 limit 1000) as t);

1	delete a from test_data a inner join (select update_time from (select update_time from test_data group by update_time having count(1) > 1) as t) d on d.update_time = a.update_time;

感兴趣的可以用这两个sql测试一下性能
5、扩展
如果感觉一次性删除数据量太大，可以使用limit来限制条数,例如:
delete a from test_data a inner join (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t limit 1000 ) d on d.update_time = a.update_time;